In the Claims:

Claims 1-34 are pending in this application, and the status of each is listed
below:

1. (currently amended) A computer assisted method of auditing a superset of
training data, the superset comprising examples of documents having one or more 
preexisting category assignments, the method including: 

partitioning the superset into at least two disjoint sets, including a test set and a
training set, wherein the test set includes one or more test documents and the
training set includes examples of documents belonging to at least two
categories;

automatically categorizing the test documents using the training set; 

calculating a metric of confidence based on results of the categorizing step and
comparing the automatic category assignments for the test documents to the
preexisting category assignments; and

reporting the test documents and preexisting category assignments that are
suspicious and the automatic category assignments that appear to be missing from
the test documents, based on the metric of confidence.
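The following Python sketch illustrates one pass of the method of claim 1 under stated assumptions: documents are sparse term-weight dictionaries, the categorizer is a k-nearest-neighbor classifier using cosine similarity, the confidence for a category is the summed similarity of the neighbors assigned to it, and a single hypothetical `threshold` separates weakly supported preexisting assignments (suspicious) from strongly supported absent ones (missing). All function names are illustrative and are not taken from the application.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


def knn_scores(test_vec, training, k):
    """Score each category by the summed similarity of the k nearest
    training examples carrying it. training: list of (vector, categories)."""
    ranked = sorted(((cosine(test_vec, vec), cats) for vec, cats in training),
                    key=lambda pair: pair[0], reverse=True)
    scores = defaultdict(float)
    for sim, cats in ranked[:k]:
        for cat in cats:
            scores[cat] += sim
    return dict(scores)


def audit_partition(test_set, training, k=3, threshold=0.5):
    """Compare automatic and preexisting assignments for one partition.
    test_set: list of (doc_id, vector, preexisting_categories)."""
    report = []
    for doc_id, vec, existing in test_set:
        scores = knn_scores(vec, training, k)
        # Preexisting assignments the neighbors barely support are suspicious.
        suspicious = sorted(c for c in existing if scores.get(c, 0.0) < threshold)
        # Confident automatic assignments absent from the document are missing.
        missing = sorted(c for c, s in scores.items() if s >= threshold and c not in existing)
        if suspicious or missing:
            report.append((doc_id, suspicious, missing))
    return report


training = [({"a": 1.0}, {"X"}), ({"a": 1.0, "b": 0.1}, {"X"}),
            ({"b": 1.0}, {"Y"}), ({"b": 1.0, "c": 0.2}, {"Y"})]
test_set = [("d1", {"a": 1.0}, {"Y"}), ("d2", {"b": 1.0}, {"Y"})]
print(audit_partition(test_set, training))
```

Here `d1` resembles the `X` examples but carries only `Y`, so `Y` is reported as suspicious and `X` as missing; `d2` is consistent with its neighbors and is not reported.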

2. (original) The method of claim 1, further including repeating the
partitioning, categorizing and calculating steps until at least one-half of the 
documents in the superset have been assigned to the test set. 

3. (original) The method of claim 2, wherein the test set created in the 
partition step has a single test document. 

4. (original) The method of claim 2, wherein the test set created in the
partition step has a plurality of test documents. 

5. (original) The method of claim 1, further including repeating the
partitioning, categorizing and calculating steps until substantially all of the
documents in the superset have been assigned to the test set.
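Claims 2 through 5 repeat the partitioning so that documents rotate through the test set. A minimal sketch of the single-document variant of claim 3 is a leave-one-out loop, assuming the same hypothetical cosine-similarity kNN scoring as above; after one pass per document, every document in the superset has been tested once. The names `leave_one_out_audit` and `superset` are illustrative.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


def leave_one_out_audit(superset, k=3):
    """superset: dict doc_id -> (vector, preexisting_categories).
    Each pass puts one document in the test set and the rest in the
    training set; returns per-document category confidence scores."""
    results = {}
    for test_id, (test_vec, _) in superset.items():
        training = [(vec, cats) for doc_id, (vec, cats) in superset.items() if doc_id != test_id]
        ranked = sorted(((cosine(test_vec, vec), cats) for vec, cats in training),
                        key=lambda pair: pair[0], reverse=True)
        scores = defaultdict(float)
        for sim, cats in ranked[:k]:
            for cat in cats:
                scores[cat] += sim
        results[test_id] = dict(scores)
    return results


superset = {"d1": ({"a": 1.0}, {"X"}), "d2": ({"a": 1.0}, {"X"}), "d3": ({"a": 1.0}, {"Y"})}
print(leave_one_out_audit(superset, k=2))
```

In this toy superset `d3` receives no support for its preexisting category `Y`, which is the kind of assignment the reporting step would surface.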
6. (original) The method of claim 1, wherein the partitioning, categorizing
and calculating steps are carried out substantially without user intervention.

7. (original) The method of claim 5, wherein the partitioning, categorizing 
and calculating steps are carried out substantially without user intervention.

8. (original) The method of claim 1, wherein the partitioning,
categorizing, calculating and reporting steps are carried out substantially without 
user intervention. 

9. (original) The method of claim 5, wherein the partitioning,
categorizing, calculating and reporting steps are carried out substantially without 
user intervention. 

10. (original) The method of claim 1, wherein the categorizing step includes
determining k nearest neighbors of the test documents and the calculating step is 
based on a k nearest neighbors categorization logic. 

11. (currently amended) The method of claim 10, wherein the metric of
confidence is an unweighted measure of distance between a particular test
document and the examples of documents belonging to various categories.

12. (currently amended) The method of claim 11, where the unweighted
measure includes application of a relationship $\Omega_0(d_t, T_m) = \sum_{d \in kNN(d_t) \cap T_m} s(d_t, d)$,

wherein 

$\Omega_0$ is a function of the particular test document represented by a feature
vector $d_t$ and of various categories $T_m$; and

$s$ is a metric of distance between the particular test document feature vector
$d_t$ and certain sample documents represented by feature vectors $d$, the
certain sample documents being among a set of k nearest neighbors of the
particular test document having category assignments to the various
categories $T_m$.
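A sketch of the unweighted relationship of claim 12 follows. It assumes dense feature vectors and treats $s$ as cosine similarity, as is common in kNN categorization, even though the claim calls $s$ a metric of distance; with a true distance the terms would need to be inverted or negated so that closer neighbors contribute more. `k_nearest` and `omega0` are hypothetical names.

```python
import math


def cosine(u, v):
    """Cosine similarity between two dense feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def k_nearest(d_t, examples, k):
    """Return (similarity, categories) for the k examples most similar to d_t.
    examples: list of (feature_vector, set_of_categories)."""
    scored = [(cosine(d_t, vec), cats) for vec, cats in examples]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


def omega0(d_t, category, examples, k):
    """Unweighted confidence: sum of s(d_t, d) over the k nearest
    neighbors d that are assigned to category T_m."""
    return sum(sim for sim, cats in k_nearest(d_t, examples, k) if category in cats)


examples = [((1.0, 0.0), {"A"}), ((0.0, 1.0), {"B"}), ((1.0, 1.0), {"A", "B"})]
print(omega0((1.0, 0.0), "A", examples, k=2))  # 1 + 1/sqrt(2)
```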

13. (currently amended) The method of claim 10, wherein the metric of
confidence is a weighted measure of distance between a particular test
document and the examples of documents belonging to various categories, the
weighted measure taking into account the density of a neighborhood of the test
document.

14. (currently amended) The method of claim 13, wherein the
weighted measure includes application of a relationship

$\Omega_1(d_t, T_m) = \frac{\sum_{d_1 \in kNN(d_t) \cap T_m} s(d_t, d_1)}{\sum_{d_2 \in kNN(d_t)} s(d_t, d_2)}$, wherein

$\Omega_1$ is a function of the test document represented by a feature vector $d_t$
and of various categories $T_m$; and

$s$ is a metric of distance between the test document feature vector $d_t$ and
certain sample documents represented by feature vectors $d_1$ and $d_2$, the
certain sample documents $d_1$ being among a set of k nearest neighbors of
the test document having category assignments to the various categories $T_m$
and the certain sample documents $d_2$ being among a set of k nearest
neighbors of the test document.
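The weighted measure of claim 14 can be read as the unweighted sum over same-category neighbors $d_1$ divided by the sum over all k neighbors $d_2$, which is one way to account for neighborhood density. The sketch below assumes that reading and the same cosine-similarity convention for $s$; `omega1` is a hypothetical name.

```python
import math


def cosine(u, v):
    """Cosine similarity between two dense feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def k_nearest(d_t, examples, k):
    """Return (similarity, categories) for the k examples most similar to d_t."""
    scored = [(cosine(d_t, vec), cats) for vec, cats in examples]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


def omega1(d_t, category, examples, k):
    """Weighted confidence: similarity mass of neighbors in category T_m
    divided by the similarity mass of all k neighbors."""
    neighbors = k_nearest(d_t, examples, k)
    total = sum(sim for sim, _ in neighbors)
    in_category = sum(sim for sim, cats in neighbors if category in cats)
    return in_category / total if total else 0.0


examples = [((1.0, 0.0), {"A"}), ((0.0, 1.0), {"B"}), ((1.0, 1.0), {"A", "B"})]
print(omega1((1.0, 0.0), "B", examples, k=2))  # about 0.414
```

With nonnegative similarities the ratio lies between 0 and 1, so scores from dense and sparse neighborhoods can be compared on a single scale.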

15. (currently amended) The method of claim 1, wherein the reporting step
further includes filtering the test documents based on the metric of
confidence.

16. (currently amended) The method of claim 15, wherein the filtering step 
further includes color coding the identified test documents based on the metric of 
confidence. 

17. (currently amended) The method of claim 15, wherein the filtering step
further includes selecting for display the identified test documents based on the
metric of confidence.
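Claims 15 through 17 filter, color code and select test documents by their confidence. One possible sketch, using hypothetical cut-off values and a three-color scheme that the claims do not specify, keeps low-confidence preexisting assignments and lists the least confident first, which also produces a sorted display of the kind recited in claim 20.

```python
def color_for(confidence, low=0.3, high=0.7):
    """Map a confidence value to a hypothetical three-band color code."""
    if confidence < low:
        return "red"
    if confidence < high:
        return "yellow"
    return "green"


def filter_for_display(scored_docs, max_confidence=0.7):
    """scored_docs: list of (doc_id, category, confidence) for preexisting
    assignments. Keep those below max_confidence, least confident first,
    and attach a color code to each."""
    selected = [row for row in scored_docs if row[2] < max_confidence]
    selected.sort(key=lambda row: row[2])
    return [(doc_id, cat, conf, color_for(conf)) for doc_id, cat, conf in selected]


rows = [("d1", "X", 0.9), ("d2", "Y", 0.1), ("d3", "X", 0.5)]
print(filter_for_display(rows))
```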

18. (original) The method of claim 1, wherein the reporting includes
generating a printed report.

19. (original) The method of claim 1, wherein the reporting includes
generating a file conforming to XML syntax.
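For claim 19, an audit report can be serialized with Python's standard `xml.etree.ElementTree` module. The element and attribute names below (`report`, `document`, `suspicious`, `missing`, `id`, `category`) are assumptions for illustration, not a schema defined in the application.

```python
import xml.etree.ElementTree as ET


def report_to_xml(report):
    """report: list of (doc_id, suspicious_categories, missing_categories).
    Returns the report as an XML string."""
    root = ET.Element("report")
    for doc_id, suspicious, missing in report:
        doc = ET.SubElement(root, "document", id=doc_id)
        for cat in suspicious:
            # Preexisting assignment with low confidence.
            ET.SubElement(doc, "suspicious", category=cat)
        for cat in missing:
            # High-confidence automatic assignment absent from the document.
            ET.SubElement(doc, "missing", category=cat)
    return ET.tostring(root, encoding="unicode")


print(report_to_xml([("d1", ["Y"], ["X"])]))
```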
20. (original) The method of claim 1, wherein the reporting includes
generating a sorted display identifying at least a portion of the
test documents.

21. (original) The method of claim 1, further including calculating a
precision score for the identified test documents.

22. (currently amended) A computer assisted method of auditing a superset
of training data, the superset comprising examples of documents having one or 
more preexisting category assignments, the method including: 

determining k nearest neighbors of the documents in a test subset
automatically partitioned from the superset;

automatically categorizing the documents based on the k nearest neighbors 
into a plurality of categories; 

calculating a metric of confidence based on results of the categorizing step 
and comparing the automatic category assignments for the documents to the
preexisting category assignments; and

reporting the documents in the test subset and preexisting category 
assignments that are suspicious and the automatic category assignments 
that appear to be missing from the documents in the test subset , based on 
the metric of confidence. 

23. (original) The method of claim 22, wherein the metric of confidence is
an unweighted measure of distance between the test document and the
examples of documents belonging to various categories.
24. (original) The method of claim 23, where the unweighted measure includes
application of a relationship $\Omega_0(d_t, T_m) = \sum_{d \in kNN(d_t) \cap T_m} s(d_t, d)$, wherein

$\Omega_0$ is a function of the test document represented by a feature vector $d_t$ and of
various categories $T_m$; and

$s$ is a metric of distance between the test document feature vector $d_t$ and certain
sample documents represented by feature vectors $d$, the certain sample documents
being among a set of k nearest neighbors of the test document having category
assignments to the various categories $T_m$.

25. (original) The method of claim 22, wherein the metric of confidence is a
weighted measure of distance between the test document and the examples of
documents belonging to various categories, the weighted measure taking into account
the density of a neighborhood of the test document.

26. (original) The method of claim 25, wherein the weighted measure includes
application of a relationship $\Omega_1(d_t, T_m) = \frac{\sum_{d_1 \in kNN(d_t) \cap T_m} s(d_t, d_1)}{\sum_{d_2 \in kNN(d_t)} s(d_t, d_2)}$, wherein

$\Omega_1$ is a function of the test document represented by a feature vector $d_t$ and of
various categories $T_m$; and

$s$ is a metric of distance between the test document feature vector $d_t$ and certain
sample documents represented by feature vectors $d_1$ and $d_2$, the certain sample
documents $d_1$ being among a set of k nearest neighbors of the test document
having category assignments to the various categories $T_m$ and the certain sample
documents $d_2$ being among a set of k nearest neighbors of the test document.

27. (original) The method of claim 22, wherein the determining, categorizing and
calculating steps are carried out substantially without user intervention.

28. (currently amended) The method of claim 22, wherein the reporting
step further includes filtering the documents based on the metric of confidence.
29. (original) The method of claim 28, wherein the filtering step further includes
color coding the reported documents based on the metric of confidence.

30. (original) The method of claim 28, wherein the filtering step further includes
selecting for display the reported documents based on the metric of
confidence.

31. (original) The method of claim 22, wherein the reporting includes
generating a printed report.

32. (original) The method of claim 22, wherein the reporting includes
generating a file conforming to XML syntax.

33. (original) The method of claim 22, wherein the reporting includes
generating a sorted display identifying at least a portion of the documents.

34. (original) The method of claim 22, further including calculating a precision
score for the reported documents.